Parallel interrogation of the chalcogenide-based micro-ring sensor array for photoacoustic tomography

Photoacoustic tomography (PAT), also known as optoacoustic tomography, is an attractive imaging modality that provides optical contrast with acoustic resolution. Recent progress in the applications of PAT largely relies on the development and employment of ultrasound sensor arrays with many elements. Although on-chip optical ultrasound sensors have been demonstrated with high sensitivity, large bandwidth, and small size, PAT with on-chip optical ultrasound sensor arrays is rarely reported. In this work, we demonstrate PAT with a chalcogenide-based micro-ring sensor array containing 15 elements, each of which supports a bandwidth of 175 MHz (−6 dB) and a noise-equivalent pressure of 2.2 mPa Hz^(-1/2). Moreover, by synthesizing a digital optical frequency comb (DOFC), we further develop an effective means of parallel interrogation for this sensor array. As a proof of concept, parallel interrogation with only one light source and one photoreceiver is demonstrated for PAT with this sensor array, providing images of fast-moving objects, leaf veins, and live zebrafish. The superior performance of the chalcogenide-based micro-ring sensor array and the effectiveness of the DOFC-enabled parallel interrogation offer great prospects for advancing applications in PAT.

In their work, the authors demonstrate a linear array of ultrasound detectors in which the individual elements are microrings fabricated in chalcogenide glass. The authors report high sensitivity and bandwidth and demonstrate their system for imaging a leaf and a zebrafish. The work is technically sound and the challenge of developing arrays of optical detectors is very important to the fields of ultrasound and photoacoustics. However, there are some issues with the novelty of the work and the way it is presented. I believe major modifications need to be made to the manuscript before it could be considered for publication in Nature Communications:
1) It is not clear what the main novelty of the work is. The type of resolution reported in the work has been achieved by the UCL group using Fabry-Perots, where parallel interrogation was reported for up to 16 channels. PDMS-coated resonators have been reported in silicon by Hazan et al (Ref. 19) and the frequency-comb interrogation scheme has already been published. The authors need to better pinpoint the innovation of the technology, in terms of unique performance and/or some missing component that didn't exist previously that allowed them to combine all these techniques into a single system.
2) The authors claim that using a single coherent detector is a much better option than using multiple photodiodes, as done by other groups. This argument has two flaws. First, photodiodes are cheap, and it is not obvious to me that a single coherent detector is cheaper than 10 photodiodes. Second, the true cost is not in the photodiodes, which are cheap components, but in the sampling electronics. The authors fail to mention that the sampling BW of their system must be much higher than the acoustic BW since they measure the interference between all the resonators. The cost and complexity of sampling systems lie not just in the number of channels, but also in the total bit/s data rate, which would be higher in the proposed scheme because not all the BW is utilized.
3) The method strongly relies on tuning the microrings' spectrum to fit that of the frequency comb. However, it is well known that there could be drifts in photosensitivity-induced structures over time. In addition, temperature variations could also scramble the microring spectrum. Could the authors remeasure the spectrum of the micro-ring array now and show if any differences have been observed? Also, the authors should perform a temperature-dependent measurement and comment on how stable the micro-ring comb spectrum is over a normal temperature range (20-30 ℃).
4) More information should be given on the coherence detection scheme to explain how the raw signal from a single coherent detector is transformed into an array. This should include showing the signals at each step of the process.
5) The authors claim that the PDMS layer was used only to protect the micro-ring and that the signal is a result of the high photo-elastic coefficient of the glass. However, this contradicts the conclusion of Hazan et al. in silicon, where it was experimentally shown that the signal actually comes from the PDMS. The authors should substantiate their claim by using harder coating materials (e.g. normal silica, or may not need such a high-frequency bandwidth for imaging thick biological samples. A detailed discussion on this issue may be helpful to researchers in the field to design their optical ultrasound sensors in the future.
7. What is the demodulation speed of the interrogation means? Can it reach 10 Hz to be consistent with the repetition rate of the pulse laser source in photoacoustic tomography?
8. Linear array may not be the optimal choice for single-shot photoacoustic tomography, as it suffers from the problem of limited view. This problem was alleviated by scanning the samples in the paper but at the cost of multiple shots. I suggest the authors consider the possibility to extend this sensor array into a two-dimensional structure. What are the possible challenges to making a two-dimensional sensor array? Can we use the same interrogation means with one coupling waveguide? What is the limiting factor that determines the maximum number of elements?
9. What is the stability of this sensor array? Since the authors use visible light to tune the resonant frequencies of each microring sensor, I suspect the sensor array may be susceptible to the environment, such as temperature, humidity, and ambient light illumination. The authors should quantify this effect and provide more details on this point.
10. From the perspective of design, would the authors be able to comment on the scalability of this device? In theory, what would be the maximum number of microrings based on the current configuration? Would it be possible to extend this technique to fabricate a 2D array?
In conclusion, the paper presents an impressive step forward, but its value to the community could be enhanced by presenting the data more clearly.

In their work, the authors demonstrate a linear array of ultrasound detectors in which the individual elements are microrings fabricated in chalcogenide glass. The authors report high sensitivity and bandwidth and demonstrate their system for imaging a leaf and a zebrafish. The work is technically sound and the challenge of developing arrays of optical detectors is very important to the fields of ultrasound and photoacoustics. However, there are some issues with the novelty of the work and the way it is presented. I believe major modifications need to be made to the manuscript before it could be considered for publication in Nature Communications:

1. It is not clear what the main novelty of the work is. The type of resolution reported in the work has been achieved by the UCL group using Fabry-Perots, where parallel interrogation was reported for up to 16 channels. PDMS-coated resonators have been reported in silicon by Hazan et al (Ref. 19) and the frequency-comb interrogation scheme has already been published. The authors need to better pinpoint the innovation of the technology, in terms of unique performance and/or some missing component that didn't exist previously that allowed them to combine all these techniques into a single system.

Response:
We thank the reviewer for his suggestion to better pinpoint the innovation of our work.
The main novelty of our work includes two parts. First, we demonstrated an on-chip chalcogenide-based micro-ring sensor array with 15 elements for photoacoustic tomography. These micro-ring sensors show high quality factors (10^5), large bandwidth (175 MHz), and low noise-equivalent pressure (2.2 mPa Hz^(-1/2)). These values are comparable to those of state-of-the-art optical ultrasound sensors. Second, we developed an interrogation means for the sensor array by synthesizing a digital optical frequency comb (DOFC), which simplifies the imaging setup by employing only one continuous-wave light source and one coherent photodetector. Since on-chip micro-rings with high quality factors generally exhibit resonant dips with narrow linewidth, the unique property of the DOFC, a tunable and ultra-narrow comb tooth, is well suited for measuring the transmission spectrum of the sensor array with high accuracy. The complete knowledge of the transmission spectrum allows the acoustic signals measured by all micro-ring sensors to be acquired simultaneously. Therefore, the marriage of these two technical innovations breeds a compact imaging system for photoacoustic tomography. To emphasize the novelty of our work, in the revised main text, we modified the description in the section "Introduction" and emphasized the innovation of this work:
a. In contrast to optically generated frequency combs, the DOFC holds the unique advantage of generating an ultra-narrow and tunable comb tooth. Since on-chip micro-rings with high quality factors generally exhibit resonant dips with narrow linewidth, this unique property of the DOFC is well suited to locate resonant frequencies for all micro-ring sensors in parallel with high accuracy.
b. These results indicate that the marriage of these two technical innovations, i.e., high-performance micro-ring sensor array and DOFC-enabled parallel interrogation means, breeds a compact imaging system of PAT using an ultrasound optical sensor array, offering great prospects for clinical applications.
2. The authors claim that using a single coherent detector is a much better option than using multiple photodiodes, as done by other groups. This argument has two flaws. First, photodiodes are cheap, and it is not obvious to me that a single coherent detector is cheaper than 10 photodiodes. Second, the true cost is not in the photodiodes, which are cheap components, but in the sampling electronics. The authors fail to mention that the sampling BW of their system must be much higher than the acoustic BW since they measure the interference between all the resonators. The cost and complexity of sampling systems lie not just in the number of channels, but also in the total bit/s data rate, which would be higher in the proposed scheme because not all the BW is utilized.

Response:
We thank the reviewer for pointing this out. In the manuscript, we did not claim that the developed means of parallel interrogation uses cheaper devices than 10 photodiodes. The benefit of using a single coherent detector rather than multiple photodiodes lies in the simplification and compactness of the imaging system. Nonetheless, we agree with the reviewer that we indeed used a much higher sampling bandwidth to realize parallel interrogation. This is natural as we need to collect more information (from all 15 micro-rings) within the same amount of time. To avoid confusing potential readers, we added the following sentences in the section "Method" of the revised main text to discuss this issue, which explicitly show that the sampling bandwidth of our system is higher:
We also note that the employment of the DOFC does not reduce the amount of data required for image reconstruction. In this condition, the coherent receiver is expected to have a much larger bandwidth than a standard photodiode. Moreover, extracting the entire transmission spectrum further sacrifices the bandwidth to some extent. As a result, the developed means of parallel interrogation essentially trades the bandwidth of the detection for the compactness of the imaging system.
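The bandwidth trade-off discussed in this response can be made concrete with a rough, order-of-magnitude sketch. It uses three numbers from this file: the element count (15), the acoustic bandwidth per element (175 MHz), and the average spacing of adjacent resonant dips (1.66 GHz, Supplementary Note 13). Everything else is an assumption made for illustration: each channel is taken to be sampled at the Nyquist rate, and the single receiver is assumed to digitize a band covering all 15 dips. This is not the authors' acquisition budget.

```python
# Rough sampling-rate comparison: many narrowband channels vs. one wideband channel.
# All modelling choices below are illustrative assumptions, not the authors' design.

N_RINGS = 15                # number of micro-ring elements (from the manuscript)
ACOUSTIC_BW_HZ = 175e6      # acoustic bandwidth per element (from the manuscript)
DIP_SEPARATION_HZ = 1.66e9  # average spacing of adjacent resonant dips (Supp. Note 13)


def nyquist_rate(bandwidth_hz):
    """Minimum real-valued sampling rate for a signal of the given bandwidth."""
    return 2.0 * bandwidth_hz


def multi_channel_rate(n_rings, acoustic_bw_hz):
    """Aggregate rate if every ring had its own photodiode and digitizer."""
    return n_rings * nyquist_rate(acoustic_bw_hz)


def single_channel_rate(n_rings, dip_separation_hz):
    """ASSUMPTION: one receiver digitizes a band spanning all n_rings dips."""
    return nyquist_rate(n_rings * dip_separation_hz)


parallel_rate = multi_channel_rate(N_RINGS, ACOUSTIC_BW_HZ)
single_rate = single_channel_rate(N_RINGS, DIP_SEPARATION_HZ)
rate_ratio = single_rate / parallel_rate

print(f"15 separate channels: {parallel_rate / 1e9:.2f} GS/s in total")
print(f"one wideband channel: {single_rate / 1e9:.2f} GS/s")
print(f"ratio: {rate_ratio:.1f}x")
```

Under these assumptions the single wideband channel needs roughly an order of magnitude more samples per second than the 15 narrowband channels combined, which is the cost the response describes as trading detection bandwidth for compactness.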

3. The method strongly relies on tuning the microrings' spectrum to fit that of the frequency comb. However, it is well known that there could be drifts in photosensitivity-induced structures over time. In addition, temperature variations could also scramble the microring spectrum. Could the authors remeasure the spectrum of the micro-ring array now and show if any differences have been observed? Also, the authors should perform a temperature-dependent measurement and comment on how stable the micro-ring comb spectrum is over a normal temperature range (20-30 ℃).

Response:
We thank the reviewer for asking this important question. The spectrum of the sensor array can be quite stable for several hours, which is sufficiently long to perform PAT. On the one hand, we found that a temperature change in the water causes the resonant frequencies of all micro-ring sensors to shift in the same direction by roughly the same amount, indicating that temperature fluctuations in the water do not scramble the spectrum. On the other hand, we found that temperature fluctuations in the room do not considerably affect the spectrum of the sensor array. The data that support the above claims, i.e., spectrum stability as a function of time and temperature variation, are supplemented in the revised submission. In particular, in the revised main text, we added the following sentence in the section "Frequency tuning of the micro-ring sensor array":
These resonant dips can stay well resolved within 6 hours, which is detailed in Supplementary Note 13.
In the revised Supplementary Information, we also added a new section "Supplementary Note 13" to describe the stability issue in terms of both time and temperature.
Supplementary Note 13. The examination of the spectrum stability of the micro-ring sensor array
The DOFC method strongly relies on the delicate tuning of the sensor spectrum. In this section, we examined the spectrum stability of the micro-ring sensor array. Firstly, we measured the transmission spectrum of the sensor array over time for 6 hours, which is shown in Supplementary Fig. 12. As we can see from the figure, the photo-sensitive effect of the material did cause observable resonant frequency shifts in the spectrum. Nonetheless, all 15 resonant dips are still well resolvable, allowing sustainable imaging operation using the sensor array. For comparison purposes, grey dashed lines were used to denote the position of the original resonant frequencies. Quantitatively, the mean absolute frequency drifts of the 15 resonant frequencies after 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, and 6 hours are 0.319 GHz, 0.451 GHz, 0.451 GHz, 0.312 GHz, 0.401 GHz, and 0.375 GHz, respectively. Correspondingly, the standard deviations of the frequency drifts of the 15 resonant frequencies after 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, and 6 hours are 0.067 GHz, 0.267 GHz, 0.267 GHz, 0.188 GHz, 0.309 GHz, and 0.208 GHz, respectively. All these values are much smaller than the average separation of adjacent resonant dips (1.66 GHz). These results show that the micro-ring sensor array can still function normally even after being placed in the aqueous environment for 6 hours.
In addition, a temperature change of the water could also cause resonant frequency shifts. To examine this issue, we varied the temperature of the aqueous environment within a range from 25 to 40 ℃ by using a heating pad. Notably, it was observed that all 15 sensors in the array exhibited roughly the same frequency drifts along the same direction. This observation indicates that the variation in the temperature of the water does not scramble the spectrum of the sensor array.
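The comparison between the reported drifts and the 1.66 GHz dip separation can be checked with a few lines of arithmetic. The sketch below only restates the values quoted in this note; the "mean plus one standard deviation" figure is a margin chosen here for illustration and is not a criterion stated by the authors.

```python
# Drift statistics quoted in Supplementary Note 13 (GHz), for 1 h to 6 h.
MEAN_ABS_DRIFT_GHZ = [0.319, 0.451, 0.451, 0.312, 0.401, 0.375]
STD_DRIFT_GHZ = [0.067, 0.267, 0.267, 0.188, 0.309, 0.208]
DIP_SEPARATION_GHZ = 1.66  # average separation of adjacent resonant dips


def worst_fraction_of_separation(values_ghz, separation_ghz):
    """Largest value expressed as a fraction of the dip separation."""
    return max(values_ghz) / separation_ghz


# Illustrative margin (an assumption): mean absolute drift plus one standard deviation.
mean_plus_std = [m + s for m, s in zip(MEAN_ABS_DRIFT_GHZ, STD_DRIFT_GHZ)]

worst_mean = worst_fraction_of_separation(MEAN_ABS_DRIFT_GHZ, DIP_SEPARATION_GHZ)
worst_margin = worst_fraction_of_separation(mean_plus_std, DIP_SEPARATION_GHZ)

print(f"largest mean drift / separation:        {worst_mean:.2f}")
print(f"largest (mean + 1 std) / separation:    {worst_margin:.2f}")
```

With the quoted numbers, the largest mean drift is a little over a quarter of the dip separation, and the mean plus one standard deviation stays below half of it, which is consistent with the statement that the dips remain resolvable.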
The mean resonant frequency shifts for these 15 resonant frequencies are plotted in Supplementary Fig. 13 as a function of temperature using blue dots, while their standard deviations are represented using red circles. As we can see from the figure, the frequency drift increases linearly with temperature. A linear fit to the measured data yields a thermo-optic coefficient of about 25.9 pm/℃.

Then, we swept the transmission spectrum to locate the positions of the resonant frequencies f_i, where i labels the micro-ring sensors. Taking the sensor array with 15 elements as an example, i = 1, 2, 3, ⋯, 15. In the absence of the ultrasonic wave, we recorded f_i(0) as the original resonant frequency. When the ultrasonic wave interacts with the sensor array, all f_i(t) start to change with time. In this condition, we define the time-dependent amplitude of the ultrasonic wave received by the i-th element, A_i^PA(t), as given in Eq. (S8). Thus, aided by the transmission spectrum measured through the DOFC, we can determine the time-dependent amplitude of the ultrasonic wave from all micro-ring sensors in parallel, without the need to lock single-wavelength lasers to the resonant frequencies as conventional methods do.

the SiO2-based micro-ring is much harder than the PDMS cladding, thus the signal comes from the PDMS. The situation is completely different in our work, where the micro-ring was made using soft chalcogenide material. To validate this conclusion by examining whether the PDMS cladding contributes to the measured signals, we performed additional experiments as suggested by the reviewer. In particular, we compared the performances of micro-ring sensors without cladding and with PDMS or SiO2 claddings. The comparison shows that most of the signal indeed comes from the chalcogenide-based micro-ring structures, while the contribution from the elastic deformation in PDMS is small.
To illustrate this point, in the revised submission, we added the following sentences in the section "Structure of the sensor array" of the revised main text: A previous study showed that, in the architecture of silicon photonics, deformation in the PDMS cladding can contribute to the measured signals [19]. In Supplementary Note 5, we tested this effect with different types of cladding materials and found that the measured signals in this work are mainly contributed by chalcogenide-based micro-ring structures rather than PDMS claddings.
In the revised Supplementary Information, we also added a new section "Supplementary Note 5" to compare the performance of micro-ring sensors with different claddings and commented on the difference between our work and the work reported by Hazan et al.

Supplementary Note 5. Effects of claddings on chalcogenide-based micro-ring sensors
In this section, we describe the effects of claddings on the performance of chalcogenide-based micro-ring sensors. As a fair comparison, we chose three micro-ring sensors with roughly the same quality factors around 5×10^5, and encapsulated them with different claddings. As the control group, one micro-ring sensor has no cladding, which is referred to as the null case. The other two sensors were encapsulated with 3-μm-thick claddings using either polydimethylsiloxane (PDMS) or silicon dioxide (SiO2). Using the same optical system and the characterization procedure described in the main text to produce Fig. 7(d), the amplitude maps of the measured signals as a function of time and translational distance are illustrated in Supplementary Fig. 4. As we can see from the figure, different claddings do not induce considerable differences in the amplitude maps and no strong surface acoustic wave is observed. Quantitatively, we found the peak values in these three cases are similar, indicating that signals are indeed contributed by the chalcogenide-based micro-ring structures.
Therefore, we conclude that for micro-ring sensors made of soft chalcogenide-based material, the choice of cladding material does not considerably affect the performance of the sensor. It is worth noting that this observation is different from the one reported in Ref. [3], in which the sensor structure was made of a hard material, SiO2. In that case, the deformation in the relatively soft PDMS cladding contributes to the measured signal.

Biomed. Eng. 64, 4-15 (2017)."

Response:
We thank the reviewer for making this valuable suggestion. We performed the characterization experiments again for the micro-ring sensor and showed both the time and frequency responses at angles of 10°, 20°, 30°, and 40°. A numerical investigation based on the theoretical framework described in the suggested reference was also provided. All new results were supplemented in the revised submission. In particular, in the section "Characterization of the micro-ring sensor" of the revised main text, we updated Fig. 7(e) and its corresponding caption as follows. In the revised Supplementary Information, we also added a new section "Supplementary Note 9" to describe detailed procedures for numerical estimations on the acceptance angle. Detailed time and frequency responses at angles of 10°, 20°, 30°, and 40° were also provided.

Supplementary Note 9. Acceptance angles for the micro-ring sensor
Optical ultrasound sensors generally support wide acceptance angles. In this section, we detail the characterization process for the acceptance angle of the micro-ring sensor. The experimental setup is shown in Fig. 7(a) of the main text. In the beginning, the relative distance between the ultrasonic source and the micro-ring sensor was about 5 mm, and the time and 30°, and 40°, respectively. Given that the 3-dB bandwidth at 0° is 115 MHz, we notice that the bandwidth of the sensor decreases quickly at small angles and more slowly at large angles. We also numerically investigated the theoretical angular response of the micro-ring sensor. In particular, we followed the procedure described in Ref. [9] to examine the spatial distribution of ultrasonic detection. To comply with the parameters used in experiments, the diameter of the micro-ring sensor was set to 40 μm and the initial relative distance between the ultrasonic point source and the sensor was set to 5 mm in the simulations. The frequency response of the micro-ring sensor as a function of the acceptance angle is shown in Supplementary Fig. 8. Two red −3 dB lines are also provided for visualization purposes. This figure confirms that the experimental results presented in Fig. 7(e) show similar trends to the theoretical ones.

7. The authors should generally explain why their resolution is not higher (Fig. 4). Generally speaking, a 40-micron microring should give a 40-micron lateral resolution. Why is that not the case here? Also, the axial resolution is lower than what I would expect for a 175 MHz bandwidth.
Response: We thank the reviewer for asking this important question about imaging resolutions.
Since the micro-ring sensor was placed at a distance from the carbon fiber and the sensor was also linearly translated with a very fine step size of 0.7 μm, the 40-μm diameter of the micro-ring sensor is not directly related to the lateral resolution of the imaging system. Instead, the lateral resolution should depend on the central frequency of the received acoustic signals. As for the axial resolution, the reviewer is correct that our measured value is lower than what can be achieved for this bandwidth.
To address this issue, we performed the experiments again to characterize the imaging resolutions with a much finer step size to avoid the effect of pixelation. In the revised main text, we updated these values in the section "Characterization of the micro-ring sensor" and discussed the discrepancy between the measured values and the theoretically estimated ones:
Theoretically, for partial view PAT, the lateral resolution is given by 0.71v/(NA·f0) ≈ 35.5 μm [49]. Here, v = 1,500 m/s is the speed of sound, NA ≈ sin(30°) = 0.5 is the numerical aperture estimated using the acceptance angle of the micro-ring sensor, and f0 = 60 MHz is the central frequency. The small discrepancy between the estimated value (35.5 μm) and the measured one (50.4 μm) may be due to the inaccurate estimation of the numerical aperture. Moreover, the theoretically estimated axial resolution is given by 0.88v/Δf ≈ 11.5 μm, where Δf = 115 MHz is the 3-dB bandwidth [49]. This value (11.5 μm) is also smaller than the measured one (18.9 μm), which is likely because high-frequency components attenuate much more than low-frequency ones in the agar that covers the carbon fiber.

Fig. 4h Fig. 9), which has also been verified by Hazan et al.

9. Using a rotation-based imaging system defies the logic of using optical sensors. As has been shown by the UCL group, optical sensors have the advantage of working in a planar geometry. The authors need to show that their technique can do the same and still get good images, even if some

Response: We thank the reviewer for his comments on performing imaging experiments in the planar geometry. Per the reviewer's suggestion, instead of rotating the sample, we performed additional imaging experiments in the planar geometry. In particular, we linearly scanned the sensor array and successfully demonstrated imaging of three interleaved black hairs.
In the section "Experimental result on imaging biological samples" of the revised main text, we added the following paragraph to illustrate the imaging results obtained in the planar geometry:

The PSF shown in
Optical sensors have the advantage of working in the planar geometry [41]. Here, instead of rotating samples, we show that our technique can function in the planar geometry as well. As shown in Fig. 8(a) (camera-captured image), three interleaved black hairs are chosen as the imaging target and are buried inside tissue-mimicking phantoms. Since 15 elements are not enough to suppress reconstruction artifacts in PAT, we linearly scanned the sensor array within a ±8-mm range and with a 20-μm step size. The reconstructed image is shown in Fig. 8
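The theoretical resolution estimates quoted in the response to comment 7 above (0.71v/(NA·f0) for the lateral resolution and 0.88v/Δf for the axial resolution) can be reproduced with a short calculation. The sketch below uses only the parameter values stated in that response; the function and variable names are chosen here for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 1500.0   # v, as quoted in the response
CENTRAL_FREQUENCY_HZ = 60e6   # f0
BANDWIDTH_3DB_HZ = 115e6      # 3-dB bandwidth
ACCEPTANCE_ANGLE_DEG = 30.0   # used to estimate the numerical aperture


def lateral_resolution_um(v_m_s, numerical_aperture, f0_hz):
    """Lateral resolution 0.71 * v / (NA * f0), returned in micrometres."""
    return 0.71 * v_m_s / (numerical_aperture * f0_hz) * 1e6


def axial_resolution_um(v_m_s, bandwidth_hz):
    """Axial resolution 0.88 * v / bandwidth, returned in micrometres."""
    return 0.88 * v_m_s / bandwidth_hz * 1e6


numerical_aperture = math.sin(math.radians(ACCEPTANCE_ANGLE_DEG))
lateral_um = lateral_resolution_um(
    SPEED_OF_SOUND_M_S, numerical_aperture, CENTRAL_FREQUENCY_HZ
)
axial_um = axial_resolution_um(SPEED_OF_SOUND_M_S, BANDWIDTH_3DB_HZ)

print(f"NA = {numerical_aperture:.2f}")
print(f"lateral resolution = {lateral_um:.1f} um")
print(f"axial resolution   = {axial_um:.1f} um")
```

The results agree with the 35.5 μm and 11.5 μm estimates given in the response, against measured values of 50.4 μm and 18.9 μm.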

In Figs. 7a-d, it is not clear what the detection geometry was. Was the detector in-plane or arranged vertically? It feels like the benefit of using more detectors was not just from SNR, but also from having more projections. I suspect that the 1-sensor measurement would still not be as good as the one from the array even if the authors averaged. A fair comparison with the same number of signals is needed to answer this question.

Response:
We thank the reviewer for asking questions regarding the detection geometry and different projections for elements in the sensor array. The first question the reviewer asks is about the detection geometry of the leaf vein. In this work, the detection geometry for the leaf vein and the zebrafish is the same, in which the target biological sample and the sensor array lie in the same plane. To clarify this point, in the revised main text, we added the following sentence in the section "Experimental result on imaging biological samples": Both the biological sample and the microring sensor array lie in the same horizontal plane, which is detailed and sketched in Supplementary Note 1.
In the revised Supplementary Information, we modified the section "Supplementary Note 1" to illustrate detailed imaging geometry for biological samples.

Supplementary Note 1. Detailed imaging procedures for biological samples
The experimental setup for imaging biological tissue is schematically shown in Supplementary Fig. 1. The light source was chosen as a 532-nm laser (Beamtech, Dawa 100) with a pulse width of 6.5 ns and a repetition rate of 10 Hz. An optical diffuser (Thorlabs, DG10-120) was used to expand and homogenize illuminating light. The light illuminated the sample from the top with an area of about 8 mm in diameter. To mitigate artifacts due to the limited view, the water tank was mounted on a motorized rotational stage, which rotated with a step size of 1 degree. The sensor array was hung in water at the same horizontal plane and was about 4 cm away from the edge of the sample. In this condition, photoacoustic tomography (PAT) was performed by rotating the zebrafish while keeping the sensor array and excitation light fixed. When the pulsed light irradiated the sample, each micro-ring sensor produced an A-line signal, which contains 4,096 data points. After rotating 360 degrees, universal back projection [1] was employed to reconstruct the image through these 360 sets of A-line signals. To acquire one image, the scanning process took about 36 seconds (360 optical pulses). A microcontroller (STMicroelectronics, STM32) was employed to synchronize the motorized rotational stage, the laser, and the data acquisition process.

The second question the reviewer asks is about the difference in projections for different micro-ring sensors. In our experimental setup, the biological samples, including both the leaf and zebrafish, are in centimeter scales. Since the center-to-center distance between different micro-ring sensors is only 400 μm and the sample was rotated during experiments, all micro-ring sensors, regardless of the ones at the edge or at the center, share similar views to the sample. Thus, we could hardly discern any considerable difference in the images reconstructed by different single micro-ring sensors.
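The argument that all elements share similar views can be illustrated with a rough geometric estimate. The sketch below takes the 15 elements and 400 μm pitch from the text and, as a simplifying assumption, treats the quoted 4 cm (sensor array to the edge of the sample) as the distance from the array to the point being viewed. It then compares the angle subtended by the whole array with the 1-degree rotation step.

```python
import math

N_ELEMENTS = 15           # elements in the array (from the text)
PITCH_MM = 0.4            # center-to-center distance, 400 um (from the text)
ROTATION_STEP_DEG = 1.0   # rotation step of the stage (from the text)
DISTANCE_MM = 40.0        # ASSUMPTION: the ~4 cm array-to-sample-edge distance
                          # is used as the array-to-target distance


def array_aperture_mm(n_elements, pitch_mm):
    """Distance between the first and last element centers."""
    return (n_elements - 1) * pitch_mm


def angular_span_deg(aperture_mm, distance_mm):
    """Full angle subtended by the array as seen from the target."""
    return math.degrees(2.0 * math.atan(aperture_mm / 2.0 / distance_mm))


aperture_mm = array_aperture_mm(N_ELEMENTS, PITCH_MM)
span_deg = angular_span_deg(aperture_mm, DISTANCE_MM)
per_element_deg = span_deg / (N_ELEMENTS - 1)

print(f"array aperture: {aperture_mm:.1f} mm")
print(f"angular span of the whole array: {span_deg:.1f} degrees")
print(f"offset between adjacent elements: {per_element_deg:.2f} degrees")
```

Under this assumption the whole array spans about 8 degrees and adjacent elements differ by less than one rotation step, so a full 360-degree rotation gives every element essentially the same set of views.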
To clarify this point, in the revised main text, we added the following sentence in the section "Experimental result on imaging biological samples": As a comparison, the reconstructed image using a typical micro-ring sensor is shown in Fig. 3(b). More reconstructed images using other single micro-ring sensors are provided and compared in Supplementary Note 3, exhibiting similar performance in such a rotation-based detection geometry.
In the revised Supplementary Information, we also added a new section "Supplementary Note 3" to illustrate reconstructed images of the leaf vein using different 1-sensor measurements.

Supplementary Note 3. Imaging reconstruction of the leaf vein with 1-sensor measurement
The reconstructed image in Fig. 3(b) of the main text was obtained using the 8th sensor in the micro-ring sensor array, which is the one at the center. Normally, different elements in the sensor array maintain different views of the sample. In this condition, a coherent summation of the information captured through these elements can improve the image quality by reducing artifacts due to a limited view. Nonetheless, since the center-to-center distance between different micro-ring sensors is only 400 μm and the sample was rotated during experiments, all micro-ring sensors, regardless of the ones at the edge or at the center, share roughly the same views of the sample. This fact can be validated by examining the reconstructed images of the leaf vein achieved through different 1-sensor measurements. In particular, Supplementary Figs. 3(a), (b), (c), and (d) show the reconstructed images using only the 1st, the 4th, the 12th, and the 15th element, respectively. As shown in the figure, we could hardly see any considerable difference in the images reconstructed by different single micro-ring sensors. Compared with the one shown in the main text, the fluctuations in terms of the contrast-to-noise ratio of these images were within 7%, which we believe is most likely due to the variation in the sensitivity of different sensing elements. Although different sensor elements in the rotation geometry share roughly the same view of the sample, the condition is completely different when performing imaging in the planar geometry.
In other words, using more sensor elements in the planar geometry generally benefits from having different projections, as pointed out by the reviewer. To illustrate this point, in the section "Experimental result on imaging biological samples" of the revised main text, we added the following paragraph to show the difference in projections of different sensor elements: Optical sensors have the advantage of working in the planar geometry [41]. Here, instead of rotating samples, we show that our technique can function in the planar geometry as well. As shown in Fig. 8(a) (camera-captured image), three interleaved black hairs are chosen as the imaging target and are buried inside tissue-mimicking phantoms. Since 15 elements are not enough to suppress reconstruction artifacts in PAT, we linearly scanned the sensor array within a ±8-mm range and with a 20-μm step size. The reconstructed image is shown in Fig. 8

It is not clear what the imaging geometry of the zebrafish was.
Response: We thank the reviewer for asking this question and allowing us to clarify this issue. The imaging geometry of the zebrafish is the same as that of the leaf vein. To clarify this point, in the revised main text, we added the following sentence in the section "Experimental result on imaging biological samples": Both the biological sample and the micro-ring sensor array lie in the same horizontal plane, which is detailed and sketched in Supplementary Note 1.
In the revised Supplementary Information, we modified the section "Supplementary Note 1" to illustrate detailed imaging geometry for biological samples.

Supplementary Note 1. Detailed imaging procedures for biological samples
The experimental setup for imaging biological tissue is schematically shown in Supplementary Fig. 1. The light source was a 532-nm laser (Beamtech, Dawa 100) with a pulse width of 6.5 ns and a repetition rate of 10 Hz. An optical diffuser (Thorlabs, DG10-120) was used to expand and homogenize the illuminating light. The light illuminated the sample from the top over an area of about 8 mm in diameter. To mitigate artifacts due to the limited view, the water tank was mounted on a motorized rotational stage, which rotated with a step size of 1 degree. The sensor array was suspended in water in the same horizontal plane as the sample, about 4 cm away from its edge.
In this condition, photoacoustic tomography (

What was the fiber-to-fiber insertion loss of the system? Could the authors elaborate on their fiber-to-chip bonding procedure?
Response: We thank the reviewer for asking this technical question. We supplemented the information regarding the fiber-to-chip insertion loss of the system in the section "Structure of the sensor array" of the revised main text: Currently, the fiber-to-waveguide insertion loss was estimated at around 6 dB for each side, indicating a total 12-dB insertion loss for the entire sensor In the revised Supplementary Information, we also added a new section "Supplementary Note 6" to describe detailed procedures for fiber-to-chip bonding.

Supplementary Note 6. The procedures for fiber-to-chip bonding
In this section, we describe the procedures for fiber-to-chip bonding. The sensor chip was placed on a coupling platform, which was monitored by a microscope. The magnifications of the eyepiece and the objective lens of the microscope were chosen to be 12× and 20×, respectively. Based on the shape and size of the propagating mode inside the bus waveguide, we chose a single-mode fiber with a mode field diameter of 3.2 ± 0.3 μm at 1550 nm (Nufern UHNA7, Coherent).
Aided by the microscope, both ends of the bus waveguide were roughly aligned to the aforementioned type of single-mode fibers, as shown in Supplementary Fig. 5. This procedure was accomplished by adjusting the three-axis high-precision translation stage (MAX311D/M, Thorlabs) underneath the fiber. Note that a small gap was left between the waveguide and fiber for fine-tuning afterward.
Supplementary Fig. 5. A schematic illustration of the fiber-to-chip bonding structure.
Then, the input side of the fiber was connected to a continuous-wave laser (Keysight, 8164B, 10-kHz linewidth, 1,550 nm), while the output side of the fiber was attached to an optical power meter (PM100D, Thorlabs) for fine-tuning. The two three-axis stages that support the two fibers were adjusted consecutively. While gradually decreasing the gap between the fiber and the waveguide along the z direction, the positions along the x and y directions were slightly adjusted to maximize the reading of the power meter. Once the fiber and the waveguide were in close contact, a small amount of ultraviolet curing adhesive (NOA61, Norland) with a refractive index of 1.56 was dripped onto the connection point. An ultraviolet curing lamp (NVSUA U365nm, Nichia) with an illumination spot of 1.5 mm and a wavelength of 365 nm was used to solidify the adhesive.
Empirically, we chose an illumination time of 5 minutes and an illumination distance of 5 cm to guarantee satisfactory performance. After this fine-tuning procedure, we experimentally determined that the insertion loss between the fiber and the waveguide was about 6 dB, leading to a total insertion loss of 12 dB for the sensor chip.
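As a quick check of the figures above, the per-facet and total insertion losses can be converted into transmitted power fractions. The following is a minimal Python sketch; the function name `db_to_fraction` is illustrative, and propagation loss inside the bus waveguide is assumed to be negligible in this estimate.

```python
def db_to_fraction(loss_db: float) -> float:
    """Convert an insertion loss in dB to the fraction of power transmitted."""
    return 10 ** (-loss_db / 10)


per_facet_loss_db = 6.0                 # reported loss at each fiber-to-waveguide interface
total_loss_db = 2 * per_facet_loss_db   # input facet + output facet = 12 dB

per_facet_fraction = db_to_fraction(per_facet_loss_db)
total_fraction = db_to_fraction(total_loss_db)

print(f"per facet:      {per_facet_fraction:.3f} of the power is transmitted")
print(f"fiber to fiber: {total_fraction:.3f} of the power is transmitted")
```

Under these assumptions, roughly one quarter of the power passes each facet and about 6% of the input power reaches the output fiber.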

Response:
We thank the reviewer for this valuable suggestion. We supplemented the information on how to produce the leaf in the revised submission. In particular, in the revised main text, we added the following sentence in the section "Experimental result on imaging biological samples": The preparation process of the leaf can be found in Supplementary Note 2.
In the revised Supplementary material, we also added a new section "Supplementary Note 2" to describe the protocol to prepare the leaf.
Supplementary Note 2. The protocol to prepare the leaf
This protocol describes the detailed procedure to prepare the leaf for imaging purposes. The main goal of the procedure is to remove the mesophyll through chemical corrosion using acidic or alkaline substances. The leaf veins, on the other hand, are kept and stained for imaging purposes. The protocol is as follows:
1. Preparing solution: add 20 g of sodium hydroxide and 10 g of sodium carbonate to 500 ml of water and stir evenly.
2. Heating solution: when the solution is about to boil, add a diamond leaf. While keeping the solution slightly boiling, heat the diamond leaf for 5 minutes.
3. Removing mesophyll: take out the diamond leaf and put it into clear water. Use a brush to gently remove the mesophyll along the directions of the veins. Then, clean the leaf with clean water and dry it.
4. Bleaching: put the leaf into sodium hypochlorite solution for bleaching, wash it with clean water after bleaching, and dry it again.
5. Staining: dilute the ink 1:1 with water and put the leaf into the solution for staining. After 3-4 minutes, take the leaf out. Rinse the leaf with clean water and dry it.
After performing the above five steps, the leaf is ready for imaging. A photo of the leaf is shown in Supplementary Fig. 2.
Supplementary Fig. 2. A photo of the leaf used for photoacoustic imaging.
14. What was the fabrication procedure of the gold layer? What type of machine was used to deposit the gold layer and how long did it take? The authors could use the following paper as reference: Seeger, M., Soliman, D., Aguirre, J. et al. Pushing the boundaries of optoacoustic microscopy by total impulse response characterization. Nat Commun 11, 2910 (2020).

Response:
We thank the reviewer for asking this question. We supplemented the information on the fabrication procedure of the gold layer in the revised submission. In particular, in the revised main text, we added the following sentence in the section "Characterization of the micro-ring sensor": … onto a 200-nm-thick golden thin film (fabrication procedure described in Supplementary Note 8) [27,30,46] In the revised Supplementary Information, we also added a new section "Supplementary Note 8" to describe the fabrication procedure.

Supplementary Note 8. The fabrication procedure for the golden layer
The fabrication procedure for the golden layer used for photoacoustic characterization is similar to that reported in Ref. [8]. The golden layer was fabricated under a high vacuum below 5×10⁻⁹ Torr using electron-beam-assisted deposition (DE400DUL, Detech). The substrate was silicon dioxide with a thickness of 1,500 μm. First, a 5-nm layer of titanium with a purity > 99.99% (Φ60 × 2 mm, ZhongNuo Advanced Material (Beijing) Technology Co., Ltd) was deposited at 0.3 Å/s on the substrate, serving as the adhesion layer. Subsequently, a 200-nm layer of gold with a purity > 99.99% (Φ60 × 2 mm, ZhongNuo Advanced Material (Beijing) Technology Co., Ltd) was deposited at the same rate. Such a relatively low deposition rate was chosen to guarantee the high crystalline quality of the deposited metal layers. The entire procedure took about 2 hours.
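The quoted duration can be cross-checked against the layer thicknesses and the deposition rate. The sketch below is a back-of-the-envelope estimate only; the helper `deposition_time_s` is illustrative, and pump-down, ramp-up, and cool-down times are assumed to be excluded.

```python
ANGSTROM_PER_NM = 10.0


def deposition_time_s(thickness_nm: float, rate_angstrom_per_s: float) -> float:
    """Time needed to deposit a layer of the given thickness at a constant rate."""
    return thickness_nm * ANGSTROM_PER_NM / rate_angstrom_per_s


rate = 0.3                                   # deposition rate in angstrom per second
ti_time_s = deposition_time_s(5.0, rate)     # 5-nm titanium adhesion layer
au_time_s = deposition_time_s(200.0, rate)   # 200-nm gold layer
total_hours = (ti_time_s + au_time_s) / 3600.0

print(f"titanium: {ti_time_s:.0f} s, gold: {au_time_s:.0f} s")
print(f"pure deposition time: {total_hours:.2f} h")
```

The pure deposition time comes out at about 1.9 hours, which is consistent with the reported duration of about 2 hours.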

What was approximately the diameter of the focused optical beam used to generate the optoacoustic point source?
Response: We thank the reviewer for asking this important question. In the revised main text, we supplemented the information in the section "Characterization of the micro-ring sensor": The diameter of the focal spot was about 9 μm.

What was the laser power, and how much was coupled into the waveguide?
Response: We thank the reviewer for asking this important technical question. The laser output power was set to 10 mW, and about 0.12 mW was coupled into the waveguide. In the revised main text, we supplemented this information in the section "Experimental setup for PAT using the sensor array":
a. … and its output power was set to 10 mW.
b. The signal beam (~0.12 mW in the bus waveguide) interacted with the sensor array …

Overall, this is an excellent work. A few questions are listed below for authors' clarification:
Response: We thank the reviewer for recognizing our work.

In line 122, what is the photoelastic coefficient of the material? Only Young's modulus of 31.9 GPa is given.

Response:
We thank the reviewer for asking this important question. We supplemented this information in the section "Structure of the sensor array" of the revised main text: This material has a good photoelastic property with Young's modulus of 31.9 GPa and photoelastic coefficient of about 0.238, leading to a good sensitivity to ultrasound.

The signal comes primarily from the refractive index change of the microring (photoelastic effect), right? Does the elastic deformation of the PDMS coating contribute to the detection?
Response: We thank the reviewer for asking this important question. Since the chalcogenide material used to make the micro-ring structure is soft, we found the signal comes primarily from the refractive index change of the micro-ring. To support this claim, we characterized the performance between micro-ring sensors with and without PDMS cladding and did not find a significant difference. Therefore, we conclude that most of the signals are indeed contributed from chalcogenide-based micro-ring structures while the contribution from the elastic deformation in PDMS is negligible. To illustrate this point, in the revised submission, we added the following sentences in the section "Structure of the sensor array": A previous study showed that, in the architecture of silicon photonics, deformation in the PDMS cladding can contribute to the measured signals [19]. In Supplementary Note 5, we tested this effect with different types of cladding materials and found that the measured signals in this work are mainly contributed by chalcogenide-based micro-ring structures rather than PDMS claddings.
In the revised Supplementary Information, we also added a new section "Supplementary Note 5" to compare the performance of micro-ring sensors with different claddings and commented on the difference between our work and the work reported by Hazan et al.

Supplementary Note 5. Effects of claddings on chalcogenide-based micro-ring sensors
In this section, we describe the effects of claddings on the performance of chalcogenide-based micro-ring sensors. For a fair comparison, we chose three micro-ring sensors with roughly the same quality factors of around 5×10⁵ and encapsulated them with different claddings. As the control group, one micro-ring sensor has no cladding, which is referred to as the null case. The other two sensors were encapsulated with 3-μm-thick claddings of either polydimethylsiloxane (PDMS) or silicon dioxide (SiO2). Using the same optical system and characterization procedure described in the main text to produce Fig. 7(d), we obtained the amplitude maps of the measured signals as a function of time and translational distance, which are illustrated in Supplementary Fig. 4. As we can see from the figure, different claddings do not induce considerable differences in the amplitude maps and no strong surface acoustic wave is observed. Quantitatively, we found that the peak values in these three cases are similar, indicating that the signals are indeed contributed by the chalcogenide-based micro-ring structures.
Therefore, we conclude that for micro-ring sensors made of soft chalcogenide-based material, the choices of cladding material do not considerably affect the performance of the sensor. It is worth noting that this observation is different from the one reported in Ref. [3] in which the sensor structure was made of a hard material SiO2. In that case, the deformation in relatively soft PDMS cladding contributes to the measured signal.

In figure 2(b), is the one in the center the bus waveguide and are the 2 on the edges the microring? Why does the cross-section profile look like this? It is better to indicate the region on figure 2(a).
Response: We thank the reviewer for asking this important question. In the original Fig. 2(b), the center one is the tapered end of the bus waveguide while the other two on the edges are the unexposed film. Therefore, the way we present the bus waveguide can be quite misleading. To avoid confusion, we modified this figure in the revised main text, which shows only the cross-section of the bus waveguide in the image. We also modified the colormap of Fig. 2(a) to make it consistent with the one in Fig. 2

DOFC is described mathematically in the SI. It will be helpful to readers if the authors can provide a short intuitive understanding of its working principle in the main text. This is one of the most important contributions of the work.
Response: We thank the reviewer for this valuable suggestion. To provide an intuitive understanding of the working principle of the DOFC, we added the following sentences in the section "Experimental setup for PAT using the sensor array" of the main text: Pictorially, the DOFC exhibits a comb structure with finely spaced comb teeth in the frequency domain, which can be measured through a single coherent detector in the time domain. By quantifying the changes in amplitude and phase of these comb teeth, the transmission spectrum of the sensor array in the frequency domain can be accurately determined. By tracking the temporal change in the transmission spectrum, time-dependent acoustic signals can be determined.
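To make this picture concrete, the sketch below samples an idealized resonance dip at discrete comb teeth and recovers a small resonance shift from the sampled amplitudes. It is a noise-free toy model, not the authors' demodulation code: the Lorentzian line shape, the dip depth, the 300-MHz linewidth, and the function names are all assumptions for illustration, and the phase information of the comb teeth is ignored.

```python
COMB_SPACING_MHZ = 39.0625  # comb-tooth spacing quoted in this response letter


def lorentzian_dip(f_mhz, f0_mhz, fwhm_mhz, depth=0.8):
    """Transmission of an idealized resonance dip (assumed line shape)."""
    x = (f_mhz - f0_mhz) / (fwhm_mhz / 2.0)
    return 1.0 - depth / (1.0 + x * x)


def estimate_resonance(teeth_mhz, transmission):
    """Estimate the dip centre from the three teeth around the deepest sample.

    For a Lorentzian dip, 1 / (1 - T) is a parabola in frequency, so the
    vertex of the parabola through three equally spaced samples gives f0.
    """
    i = min(range(1, len(teeth_mhz) - 1), key=lambda k: transmission[k])
    h = teeth_mhz[i + 1] - teeth_mhz[i]
    y0, y1, y2 = (1.0 / (1.0 - transmission[k]) for k in (i - 1, i, i + 1))
    return teeth_mhz[i] + 0.5 * h * (y0 - y2) / (y0 - 2.0 * y1 + y2)


# Comb teeth on a relative frequency axis (MHz offsets from an arbitrary origin).
teeth = [n * COMB_SPACING_MHZ for n in range(-40, 41)]


def measured_centre(f0_mhz, fwhm_mhz=300.0):
    """Sample the dip at the comb teeth and estimate its centre frequency."""
    spectrum = [lorentzian_dip(f, f0_mhz, fwhm_mhz) for f in teeth]
    return estimate_resonance(teeth, spectrum)


rest = measured_centre(100.0)      # resonance position without ultrasound
shifted = measured_centre(112.5)   # resonance displaced by a hypothetical 12.5 MHz
print(f"recovered shift: {shifted - rest:.3f} MHz")
```

The point of the toy model is that a shift much smaller than the tooth spacing can still be recovered, because several teeth sample each resonance and the line shape constrains the interpolation.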

In the section "Characterization of the micro-ring sensor", it is mentioned that the NEP is 7.1 Pa within 20 MHz. Is it an averaged value of the 15 sensors of the array? What is the uniformity of the sensitivity and the NEP of the 15 sensors?
Response: We thank the reviewer for raising this valuable question regarding the uniformity of the sensors in the array. Due to the finite fabrication precision, these micro-ring sensors hold slightly different quality factors, leading to slightly different performances. The mentioned NEP is measured from a typical micro-ring sensor with a quality factor ranked in the middle. The sensitivities and NEPs for other sensors are within ±10% of the presented one. To clarify this issue, in the revised main text, we supplemented this information in the section "Characterization of the micro-ring sensor" as follows: a. For fair assessment, we describe the characterization process for the micro-ring sensor with the In the revised Supplementary Information, we also added a new section "Supplementary Note 11" to show the detailed quantification process of the NEP with parallel interrogation.

Supplementary Note 11. Noise-equivalent pressure for the parallel interrogation method
The parallel interrogation method certainly introduces additional noise, thus increasing the NEP of the measurement. In this section, we detail the quantification process of the NEP with the parallel interrogation method. Following a procedure similar to that described in Supplementary Note 10, the amplitude spectral density of the micro-ring sensor signal is shown in Supplementary Fig. 10(a). It is worth mentioning that the parallel interrogation method adopts the spectral shift (in units of MHz) as the indicator for the strength of ultrasound. By using the same calibrated needle hydrophone, the sensitivity of the micro-ring sensor $M_{\mathrm{PI}}(f)$ was estimated and is shown in Supplementary Fig. 10(b). Similarly, the noise amplitude spectral density $V_{\mathrm{PI}}(f)$ was also quantified using the spectral shift, which is shown in Supplementary Fig. 10(c). With these parameters, the NEP spectral density $P_{a,\mathrm{PI}}(f)$ can be estimated as
$$P_{a,\mathrm{PI}}(f) = \frac{V_{\mathrm{PI}}(f)}{M_{\mathrm{PI}}(f)} \qquad \text{(S13)}$$
This parameter is shown in Supplementary Fig. 10(d). The RMS pressure within the range from 0 to 20 MHz was computed to be 36.9 Pa, which is larger than the NEP of a single micro-ring quantified above using the conventional approach.
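The quantification chain described in this note can be sketched numerically. The example below assumes the usual definition in which the NEP spectral density is the noise amplitude spectral density divided by the sensitivity, and the RMS pressure is the square root of the integral of the squared density over the band. The flat spectra are made-up placeholder values, not the measured data, and the function names are illustrative.

```python
import math


def nep_density(noise_asd, sensitivity):
    """NEP spectral density (Pa/sqrt(Hz)): noise ASD divided by sensitivity."""
    return [v / m for v, m in zip(noise_asd, sensitivity)]


def rms_pressure(freqs_hz, density):
    """Square root of the trapezoidal integral of density**2 over frequency."""
    total = 0.0
    for k in range(len(freqs_hz) - 1):
        df = freqs_hz[k + 1] - freqs_hz[k]
        total += 0.5 * (density[k] ** 2 + density[k + 1] ** 2) * df
    return math.sqrt(total)


# Hypothetical flat spectra, for illustration only (not the measured data).
freqs = [k * 1e5 for k in range(201)]   # 0 to 20 MHz in 0.1-MHz steps
noise = [1.0e-3] * len(freqs)           # spectral-shift noise, MHz/sqrt(Hz) (made up)
sens = [0.5] * len(freqs)               # sensitivity, MHz/Pa (made up)

density = nep_density(noise, sens)      # Pa/sqrt(Hz)
nep_rms = rms_pressure(freqs, density)  # Pa
print(f"RMS noise-equivalent pressure over 0-20 MHz: {nep_rms:.2f} Pa")
```

With measured spectra in place of the placeholders, the same two steps would yield the band-integrated value reported in the note.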

Since the parallel interrogation is a relatively new method. It will be helpful to include some benefits/ challenges of this method.
Response: We thank the reviewer for this valuable suggestion. We added the following sentences in the section "Discussion" of the revised main text to describe the benefits brought by the parallel interrogation method: We also developed an effective means of parallel interrogation of the microring sensor array using only one source-detector pair. In contrast to previously demonstrated microring sensor arrays that typically require one source-detector pair per channel [27], the developed means of parallel interrogation can greatly simplify the system setup while speeding up the data acquisition process. Such a simplification is particularly valuable for developing head-mounted imaging devices with the optical ultrasound sensor array, where both compact size and fast data acquisition are required. Moreover, the means of parallel interrogation eliminates the necessity to synchronize the signals measured by each element, which benefits the processes of data collection and image reconstruction for PAT. With these advantages, … We also added the following sentences in the section "Discussion" of the revised main text to describe the challenges the parallel interrogation method encountered: Besides the perspective of design, the practical challenge of parallel interrogation also comes from the energy loss of the sensor array. To employ the same interrogation means for the two-dimensional sensor array, a bus waveguide needs to be designed with a relatively long zigzag path to couple all micro-rings distributed in the two-dimensional plane. Given inevitable coupling loss and a 0.2-dB/cm energy loss of the bus waveguide based on the current fabrication technique, these micro-ring sensors will operate in considerably different conditions. In the revised Supplementary Information, we also added a new section "Supplementary Note 11" to show the detailed quantification process of the NEP with parallel interrogation.

… Fig. 4, the image seems to be quite pixelated. The authors should spend more effort on determining these values more accurately. Also, the vertical axis of the -6-dB bandwidth seems to be mistakenly labeled. Please correct.

Response:
We thank the reviewer for this careful check. The point spread functions provided in the original submission suffer from a strong effect of pixelation, which is due to insufficient sampling during the scanning process. To address this issue and quantify the point spread functions more accurately, we performed the experiment again by using a much finer scanning step size (0.7 μm). In

Response:
We thank the reviewer for asking this important question. The reviewer is correct that our measured values are worse than what can be estimated from the central frequency and the bandwidth.
To answer this question, we performed the experiments again to characterize the imaging resolutions with a much finer step size to avoid the effect of pixelation, and we also discussed potential factors that may cause this discrepancy. In the revised main text, we updated these values in the section "Characterization of the micro-ring sensor" and discussed the discrepancy between the measured values and the theoretically estimated ones: Theoretically, for partial-view PAT, the lateral resolution is given by 0.71v/(NA·f0) ≈ 35.5 μm [49]. Here, v = 1,500 m/s is the speed of sound, NA ≈ sin(30°) = 0.5 is the numerical aperture estimated using the acceptance angle of the micro-ring sensor, and f0 = 60 MHz is the central frequency. The discrepancy between the estimated value (35.5 μm) and the measured one (50.4 μm) may be due to the inaccurate estimation of the numerical aperture. Moreover, the theoretically estimated axial resolution is given by 0.88v/Δf ≈ 11.5 μm, where Δf = 115 MHz is the 3-dB bandwidth [49]. This value (11.5 μm) is also smaller than the measured one (18.9 μm), which is likely because high-frequency components attenuate much more than low-frequency ones in the agar that covers the carbon fiber.
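The two theoretical estimates quoted above follow directly from the stated parameters, as the short calculation below shows. The variable names are illustrative; the formulas and the input values are those given in the response.

```python
import math

speed_of_sound = 1500.0                        # m/s
numerical_aperture = math.sin(math.radians(30.0))  # from the 30-degree acceptance angle
central_frequency = 60e6                       # Hz
bandwidth_3db = 115e6                          # Hz

# Lateral resolution for partial-view PAT: 0.71 * v / (NA * f0)
lateral_um = 0.71 * speed_of_sound / (numerical_aperture * central_frequency) * 1e6
# Axial resolution: 0.88 * v / bandwidth
axial_um = 0.88 * speed_of_sound / bandwidth_3db * 1e6

print(f"estimated lateral resolution: {lateral_um:.1f} um (measured: 50.4 um)")
print(f"estimated axial resolution:   {axial_um:.1f} um (measured: 18.9 um)")
```

The measured values exceed the estimates by factors of roughly 1.4 (lateral) and 1.6 (axial).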

The comb-tooth spacing of the digital optical frequency comb is 39 MHz. The authors provide little information on the choice of this value. Could we use a larger or smaller comb-tooth spacing? What are the possible effects on the imaging performance?
Response: We thank the reviewer for raising this important question. The choice of the comb-tooth spacing was based on the quality factor of the micro-ring sensors. To better illustrate the criterion for choosing the comb-tooth spacing, we added the following descriptions in the section "Discussion" of the revised main text: Moreover, it should be noted that the comb-tooth spacing in this study (39.0625 MHz) was chosen based on the quality factors of these micro-ring sensors. For quality factors within the range of 5×10⁵ to 7×10⁵, the full widths at half maximum of these resonances are typically 277 to 386 MHz, meaning that 7 to 10 sampling points were used to determine the precise location of each resonant frequency. In practice, for micro-ring sensors with small quality factors, a coarser frequency comb could be used, which reduces the number of sampling points and alleviates the computational burden. In contrast, micro-ring sensors with high quality factors generally require a finer frequency comb to locate the positions of the resonant frequencies with high accuracy.
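The relation between quality factor, linewidth, and the number of comb teeth per resonance can be checked with a few lines of arithmetic. The sketch below assumes a probe wavelength of 1,550 nm (the laser wavelength mentioned elsewhere in this letter) and the standard relation FWHM = ν0/Q; the function names are illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0   # m/s
WAVELENGTH_M = 1550e-9           # assumed probe wavelength
COMB_SPACING_HZ = 39.0625e6      # comb-tooth spacing used in this study


def fwhm_hz(quality_factor: float) -> float:
    """Resonance linewidth from FWHM = optical frequency / Q."""
    optical_frequency = SPEED_OF_LIGHT / WAVELENGTH_M
    return optical_frequency / quality_factor


def teeth_per_resonance(quality_factor: float) -> float:
    """Number of comb teeth that fall within one full width at half maximum."""
    return fwhm_hz(quality_factor) / COMB_SPACING_HZ


for q in (5e5, 7e5):
    print(f"Q = {q:.0e}: FWHM = {fwhm_hz(q) / 1e6:.0f} MHz, "
          f"{teeth_per_resonance(q):.1f} teeth per FWHM")
```

Under these assumptions the linewidths are close to the quoted 277 to 386 MHz range, and each resonance is sampled by about 7 to 10 teeth.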

I suggest the authors estimate the moving velocity of fast-moving objects, which demonstrates the capability of the micro-ring sensor array and its means of parallel interrogation.

Response:
We thank the reviewer for making this valuable suggestion. We supplemented this information in the section "Experimental results on imaging fast-moving objects in a single-shot measurement" of the revised main text to demonstrate the capability of the micro-ring sensor array in imaging fast-moving objects: By quantifying the traveling distance within a given time interval, the moving speed of the microsphere was estimated to be 4.8 mm/s, which is close to the speed of flowing water in the tube.
6. The resolution of live animals does not seem to be as good as tens of micrometers. This is natural as I believe high-frequency ultrasound may not propagate well in thick scattering samples or not even be fully excited in zebrafish. If this is the case, the authors should describe this issue. In other words, we may not need such a high-frequency bandwidth for imaging thick biological samples.
A detailed discussion on this issue may be helpful to researchers in the field to design their optical ultrasound sensors in the future.

Response:
We thank the reviewer for raising this important question. We agree with the reviewer that the degradation of the resolution in the reconstructed images of zebrafish occurs because high-frequency components cannot propagate well in thick samples. To clarify this issue, we added the following descriptions in the section "Experimental result on imaging biological samples" of the revised main text: It is also noticeable that the imaging resolutions of relatively thick biological samples do not look as good as the ones characterized above. Such an observation is likely due to the strong attenuation of high-frequency ultrasound inside thick samples. This fact indicates that for deep-tissue imaging, one may sacrifice the bandwidth of the sensor to gain benefits in other properties.

What is the demodulation speed of the interrogation means? Can it reach 10 Hz to be consistent with the repetition rate of the pulse laser source in photoacoustic tomography?
Response: We thank the reviewer for asking this important question, which is critically important for real-time photoacoustic tomography imaging using the proposed interrogation means. As suggested by the reviewer, we supplemented this information in the section "Discussion" of the revised main text: The demodulation speed of the DOFC is another important issue that needs to be considered for real-time imaging. Currently, it took us 0.05 seconds to demodulate an acoustic signal with a length of 10 μs (large enough to cover the field of view) for the sensor array with 15 elements (Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz and 32 GB RAM). This demodulation speed can catch up with the laser repetition rate of real-time PAT (10 Hz). To incorporate more elements in the future, we may need to optimize the demodulation codes and upgrade the computational facility.
8. Linear array may not be the optimal choice for single-shot photoacoustic tomography, as it suffers from the problem of limited view. This problem was alleviated by scanning the samples in the paper but at the cost of multiple shots. I suggest the authors consider the possibility to extend this sensor array into a two-dimensional structure. What are the possible challenges to making a two-dimensional sensor array? Can we use the same interrogation means with one coupling waveguide? What is the limiting factor that determines the maximum number of elements?
Response: We thank the reviewer for raising this valuable question. We agree with the reviewer that the capability of extending the current linear array into a two-dimensional structure is of great value to photoacoustic tomography. In the revised submission, we discuss the possibility of extending the linear array into a two-dimensional structure and the potential challenges of using the same interrogation method with one coupling waveguide. In particular, we supplemented this information regarding the practical challenges of making a two-dimensional sensor array and adopting the same interrogation method in the section "Discussion" of the revised main text: Besides the perspective of design, the practical challenge of parallel interrogation also comes from the energy loss of the sensor array. To employ the same interrogation means for the two-dimensional sensor array, a bus waveguide needs to be designed with a relatively long zigzag path to couple all micro-rings distributed in the two-dimensional plane. Given inevitable coupling loss and a 0.2-dB/cm energy loss of the bus waveguide based on the current fabrication technique, these microring sensors will operate in considerably different conditions. The demodulation speed of the DOFC is another important issue that needs to be considered for real-time imaging. Currently, it took us 0.05 seconds to demodulate an acoustic signal with a length of 10 μs (large enough to cover the field of view) for the sensor array with 15 elements (Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz and 32 GB RAM), which can catch up with the laser repetition rate for real-time photoacoustic tomography (10 Hz). To incorporate more elements, we may need to optimize the demodulation codes and upgrade the computational facility.
Besides practical considerations, we also added descriptions of the possibilities of making a two-dimensional sensor array and the maximum number of allowed elements from the perspective of design in the section "Discussion" of the revised main text, which is detailed in the response to Comment 10.
9. What is the stability of this sensor array? Since the authors use visible light to tune the resonant frequencies of each microring sensor, I suspect the sensor array may be susceptible to the environment, such as temperature, humidity, and ambient light illumination. The authors should quantify this effect and provide more details on this point.

Response:
We thank the reviewer for asking this important question. The spectrum of the sensor array can be quite stable for several hours, which is sufficiently long to perform PAT. We also found that a temperature change in the water causes the resonant frequencies of all micro-ring sensors to shift in the same direction by roughly the same amount, indicating that temperature fluctuations in the water do not scramble the spectrum. Moreover, we found that temperature fluctuations in the room do not considerably affect the spectrum of the sensor array. The data that support the above claims, i.e., spectrum stability as a function of time and temperature variation, are supplemented in the revised submission. In particular, in the revised main text, we added the following sentence in the section "Frequency tuning of the micro-ring sensor array": These resonant dips can stay well resolved within 6 hours, which is detailed in Supplementary Note 13.
In the revised Supplementary Information, we also added a new section "Supplementary Note 13" to describe the stability issue in terms of both time and temperature.

Supplementary Note 13. The examination of the spectrum stability of the micro-ring sensor array
The DOFC method strongly relies on the delicate tuning of the sensor spectrum. In this section, we examined the spectrum stability of the micro-ring sensor array. First, we measured the transmission spectrum of the sensor array over 6 hours, as shown in Supplementary Fig. 12. As the figure shows, the photosensitive effect of the material did cause observable resonant frequency shifts in the spectrum. Nonetheless, all 15 resonant dips remain well resolved, allowing sustained imaging operation with the sensor array. For comparison, grey dashed lines denote the positions of the original resonant frequencies. Quantitatively, the mean absolute frequency drifts of the 15 resonant frequencies after 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, and 6 hours are 0.319 GHz, 0.451 GHz, 0.451 GHz, 0.312 GHz, 0.401 GHz, and 0.375 GHz, respectively. Correspondingly, the standard deviations of the frequency drifts of the 15 resonant frequencies after 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, and 6 hours are 0.067 GHz, 0.267 GHz, 0.267 GHz, 0.188 GHz, 0.309 GHz, and 0.208 GHz, respectively. All these values are much smaller than the average separation of adjacent resonant dips (1.66 GHz). These results show that the micro-ring sensor array can still function normally even after being placed in the aqueous environment for 6 hours. In addition, a change in water temperature could also cause resonant frequency shifts. To examine this issue, we varied the temperature of the aqueous environment from 25 ℃ to 40 ℃ using a heating pad. Notably, all 15 sensors in the array exhibited roughly the same frequency drift in the same direction. This observation indicates that variation in the water temperature does not scramble the spectrum of the sensor array. The mean resonant frequency shifts of these 15 resonant frequencies are plotted in Supplementary Fig. 13 as a function of temperature using blue dots, while their standard deviations are represented using red circles. As the figure shows, the frequency drift increases linearly with temperature. A linear fit to the measured data yields a thermo-optic coefficient of about 25.9 pm/℃.

Response: We thank the reviewer for raising this insightful question. In theory, the maximum number of micro-rings based on the current configuration is determined by the detection bandwidth of the DOFC, the quality factor of the micro-ring sensor, and the amount of frequency shift induced by acoustic pressure. We supplemented this information and discussed the possibility of extending this technique to fabricate a two-dimensional array in the section "Discussion" of the revised main text: For example, if micro-rings with quality factors up to 2 × 10^6 can be routinely fabricated, the full width at half maximum of the resonant frequency is about 100 MHz. Considering a ±100 MHz range for acoustic pressure induced frequency shifting, each micro-ring should at least occupy a frequency bandwidth of 200 MHz in the spectrum. Thus, a 40-GHz detection bandwidth can at most accommodate 200 micro-ring sensors in theory. …, two-dimensional sensor arrays, including ring shape or bowl shape, are possible to be realized and demonstrated for PAT, … Besides theoretical considerations, we also added descriptions of the practical challenges in making a two-dimensional sensor array in the revised main text, as detailed in the response to Comment 8.
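The capacity estimate quoted in this response can be checked numerically. The following is a minimal sketch, assuming an operating wavelength of 1550 nm (the response does not state one; the reviewer only mentions the telecom window) and the usual relation FWHM = f0/Q. The function names are hypothetical; the 200 MHz slot per ring and the 40 GHz detection bandwidth are taken directly from the text.

```python
# Rough check of the ring-count estimate quoted in the response.
# Assumption (not stated in the response): operation near 1550 nm.

C = 299_792_458.0  # speed of light in vacuum, m/s


def resonance_fwhm_hz(quality_factor, wavelength_m=1550e-9):
    """Linewidth (FWHM) of a resonance, using FWHM = f0 / Q."""
    optical_frequency_hz = C / wavelength_m
    return optical_frequency_hz / quality_factor


def max_ring_count(detection_bandwidth_hz, slot_per_ring_hz):
    """Number of non-overlapping spectral slots that fit in the bandwidth."""
    return int(detection_bandwidth_hz // slot_per_ring_hz)


fwhm_hz = resonance_fwhm_hz(2e6)           # roughly 97 MHz, i.e. "about 100 MHz"
ring_count = max_ring_count(40e9, 200e6)   # 200 MHz slot per ring, as in the text

print(f"FWHM ~ {fwhm_hz / 1e6:.0f} MHz, upper bound ~ {ring_count} rings")
```

This reproduces the quoted figures as a theoretical upper bound only; it leaves out practical constraints such as guard bands between neighboring resonances.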
In conclusion, the paper presents an impressive step forward, but its value to the community could be enhanced by presenting the data more clearly.

The authors have made major revisions to their manuscript and greatly improved its content. Specifically, the innovation of the current work is much better explained and many of the missing technical details are now provided. All in all, this work is excellent and represents a promising direction towards the goal of fully parallelized chip-based ultrasound detectors using optical technology. Nonetheless, there are still open questions about the specific performance of the device and its underlying mechanism, and I believe that it is essential that the authors address them before the work is accepted.
1) The authors convincingly show that the sensitivity indeed comes from the core and not the cladding. However, the results raise a different question: how could this device operate with an air cladding when it is supposed to be submerged (and water absorbs light in the telecom window)? My conclusion (which might be wrong) is that in this new characterization experiment, the authors positioned the chip on the water surface, where its top (cladding) was in air and its substrate (SiO2) was in the water. This means that the impinging acoustic wave came from the direction of the substrate and not the cladding. Is that true for all the measurements? The authors need to add a simple diagram showing the path travelled by the acoustic pulse until it reaches the micro-ring and explain how an air cladding was possible. The reason this is important is that normally the acoustic waves impinge directly on the micro-ring and don't need to travel through the substrate, making the authors' approach unconventional in the field and extremely interesting to other researchers. Also, is there an additional substrate on which the fibers and chip are glued, or is the whole construction held only by the adhesive between the fibers and chip?
2) In Supp. Table 1, the authors should distinguish between the performance achieved by a single detector and an array, for example the higher NEP achieved for the array and lower BW. The reduced BW should also be explicitly mentioned in the text and its value explained (why did they end up with this value and not a different one?).
3) Was the same photodetector used for the single-channel sensor and for the array? If so, what is the reason for the lower sensitivity of the array? Is it that less power was used per channel? If so, how much less? Generally, the compromises that were made to enable parallel detection need to be explained.
4) How much averaging was used in the imaging experiments?
5) Supp. Fig. 7, it would help to also show the signal for the zero angle, which will illustrate how much signal loss there is between angle zero and 10 degrees.
6) Fig. 7h, the PSF is a 3D object with both positive and negative values and the way it is currently presented is insufficient. The authors should show all 3 MAPs (from all different directions) and slices over the principal axes and not clip out the negative values and artifacts because these are a natural part of any PAT system. I believe this is essential for the readers to fully understand the true nature of the device. Please do not hide the artifacts in the PSF. They are important and actually contribute to the legitimacy of your technique. If you can show similar artifacts in simulation (e.g. due to limited view, or reduced BW as a function of angle) that would be even better. This is really a promising device that could inspire others in the field, and it would be very useful to understand its limitations as well.
7) The authors show that the tuning of the resonance is stable for at least 6 hours and also is stable against temperature variations. That is an excellent result and shows that this approach is compatible with long imaging sessions in a lab environment. However, it is still not clear whether one would need to tune the resonators before each experiment, or could just perform the tuning as part of the fabrication process and continue using the device for days/weeks/months. This is a crucial point about the type of applications this technique is currently useful for, and the possible future need for more stable tuning procedures. Also, if the stability is not too long, it opens up a new challenge for material scientists to find solutions to the drift problem.
8) Seeing the current results, there is a very simple explanation for why the achieved resolutions are lower than the theoretical values. For the axial resolution, the formula used assumed that the BW is the same for all angles, but the detector developed in this work loses its BW at higher angles. So, one would expect some sort of average effective BW that determines the axial resolution, in addition to the explanation of acoustic attenuation. Indeed, higher axial resolutions have been achieved in the past, despite acoustic attenuation. For the lateral resolution, the authors used f=60 MHz and NA=0.5, but within the 30-degree acceptance angle only the smaller angles had such a central frequency. The average central frequency over that angular range was lower, explaining the lower resolution. A short discussion is in order.
9) Conventionally, when one wishes to quantify the resolution (or more precisely, the width of the PSF) one does that on the envelope of the reconstruction and not only on the positive values. The width of the main lobe just represents the central frequency, but it's the width of the envelope that is inversely proportional to the BW. Obviously, for narrowband sensors, where the PSF has a strong ringing effect with many cycles, the axial resolution is always determined by the envelope width. See for example, 10.1002/jbio.201800357. This is true also for any textbook on ultrasound. Accordingly, I suggest updating the number calculated for the axial resolution.
Reviewer #2 (Remarks to the Author): The revision looks good, and thanks for the detailed responses from the authors. If the authors could help address the following comments, I'd be happy to recommend acceptance.
remains around the level of 0.12 mW to mitigate thermal instability and nonlinear effects. In this condition, each comb tooth can theoretically utilize only 1/N of the total power, where N (=1536 in this study) is the number of comb teeth used in the DOFC. As a result, the considerably reduced power per comb tooth is the compromise made to enable parallel interrogation. To clarify this point, we added the following sentences in the section "Characterization of the micro-ring sensor" of the revised main text: Compared to the one characterized above, the increased NEPs here are due to the even distribution of power among the comb teeth (1/1536 each, theoretically), which is a compromise to enable parallel interrogation.
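To put the 1/N power split in concrete terms, the sketch below divides the stated 0.12 mW evenly over the 1536 comb teeth. It assumes that 0.12 mW is the total optical power shared by the comb and that the comb is ideal and perfectly flat. It reports only the optical power per tooth; how this translates into NEP depends on the receiver noise and is not modeled here. The function name is hypothetical.

```python
import math


def power_per_tooth_w(total_power_w, n_teeth):
    """Optical power per comb tooth for an ideal, evenly split comb."""
    return total_power_w / n_teeth


total_power_w = 0.12e-3   # 0.12 mW, from the response (assumed to be the total power)
n_teeth = 1536            # number of comb teeth, from the response

per_tooth_w = power_per_tooth_w(total_power_w, n_teeth)
split_loss_db = 10 * math.log10(n_teeth)  # reduction relative to the total power

print(f"{per_tooth_w * 1e9:.1f} nW per tooth, {split_loss_db:.1f} dB below the total")
```

Under these assumptions each tooth carries tens of nanowatts, which gives a sense of the scale of the compromise the authors describe.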

4) How much averaging was used in the imaging experiments?
Response: We thank the reviewer for asking this important question. In all imaging experiments with biological samples, the samples were rotated with a step size of 1 degree. No additional averaging was required. To clarify this issue, we added the following sentence in the section "Experimental result on imaging biological samples" of the revised main text: No additional averaging was required for imaging these biological samples.

5) Supp. Fig. 7, it would help to show also the signal for the zero angle, which will illustrate how much signal loss there is between angle zero and 10 degrees.

Response:
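We thank the reviewer for this valuable suggestion. In Supplementary Note 9 of the revised Supplementary Information, we modified Supplementary Fig. 8 by including the signals at zero angle. The corresponding labels and descriptions were adjusted accordingly: Frequency responses of the micro-ring sensor with an acceptance angle of 0°, 10°, 20°, 30°, and 40°, respectively.

6) Fig. 7h, the PSF is a 3D object with both positive and negative values and the way it is currently presented is insufficient. The authors should show all 3 MAPs (from all different directions) and slices over the principal axes and not clip out the negative values and artifacts because these are a natural part of any PAT system. I believe this is essential for the readers to fully understand the true nature of the device. Please do not hide the artifacts in the PSF. They are important and actually contribute to the legitimacy of your technique. If you can show similar artifacts in simulation (e.g. due to limited view, or reduced BW as a function of angle) that would be even better. This is really a promising device that could inspire others in the field, and it would be very useful to understand its limitations as well.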

Response:
We thank the reviewer for raising this important issue. In Fig. 7(h) of the original submission, the negative values were directly thresholded out. We agree with the reviewer that these artifacts are a natural part of any PAT system, and thus we modified this figure by keeping the negative values and explicitly showing the artifacts in the revised submission. With the negative values retained, the axial resolution in the revised Fig. 7(j) was re-quantified using the envelope of the line profile and was found to be 43.6 μm. As suggested by the reviewer, we added the following sentences in the section "Characterization of the micro-ring sensor" of the revised main text: One-dimensional profiles for quantifying both the lateral and axial resolutions are illustrated in Figs. 7(i) and (j), with their envelopes exhibiting full widths at half maximum of about 50.4 μm and 43.6 μm, respectively. These curves have similar shapes to the ones reported in the literature [19,34,48,49] and the ones simulated numerically (detailed in Supplementary Note 12).
Moreover, as suggested by the reviewer, we used k-Wave (Version 1.2.1) to simulate this condition and compared the result with the experimentally achieved one. In the revised Supplementary Information, we added a new section "Supplementary Note 12" to describe the simulation results.

Supplementary Note 12. Numerical simulations on the point spread function (PSF) of the imaging system
In this section, we employed a numerical tool (k-Wave, Version 1.2.1) to simulate the PSF of the imaging system, which manifests the artifacts due to limited view and reduced bandwidth as a function of angle. The simulation is performed in a two-dimensional plane, and the geometrical parameters adopted are identical to the experimental conditions presented in the main text. Furthermore, we used the data presented in Fig. 7(e) of the main text to constrain the angular response during simulations. Simulation results along both the lateral and axial directions are shown in Supplementary Figs. 12(a) and (b), exhibiting similar trends to the ones obtained experimentally in Figs. 7(i) and (j), respectively. Moreover, the lateral and axial resolutions determined from experiments and simulations are found to be quantitatively close.

The reviewer also suggested providing a three-dimensional PSF. In this work, the two-dimensional PSF was characterized in Fig. 7(g) by imaging the cross-section of a carbon fiber. Since we focus on achieving two-dimensional images with a linear array, we believe the two-dimensional PSF characterizes the imaging system well. In general, as the reviewer suggested, a three-dimensional PSF could also be obtained by imaging a point object and scanning the sensor array along the third direction. Nonetheless, we anticipate that the two directions in the transverse plane should exhibit similar behaviors. For these reasons, we keep the present two-dimensional PSF in Fig. 7(h) for convenience.

7)
The authors show that the tuning of the resonance is stable for at least 6 hours and also is stable against temperature variations. That is an excellent result and shows that this approach is compatible with long imaging sessions in a lab environment. However, it is still not clear whether one would need to tune the resonators before each experiment, or could just perform the tuning as part of the fabrication process and continue using the device for days/weeks/months. This is a crucial point about the type of applications this technique is currently useful for, and the possible future need for more stable tuning procedures. Also, if the stability is not too long, it opens up a new challenge for material scientists to find solutions to the drift problem.

Response:
We thank the reviewer for raising this important question. Although we demonstrated that the resonance can be stable for at least 6 hours, it generally does not remain stable over days, weeks, or months. Thus, for experiments on different days, we still need to tune the sensor array before each experiment. Nonetheless, we anticipate that this problem can possibly be mitigated in the future by adjusting the chemical composition of the chalcogenide glass. To clarify this issue, we added the following sentences in Supplementary Note 14 of the revised Supplementary Information: However, for experiments on different days, we still need to tune the resonance spectrum of the sensor array before each experiment. For future applications that require long-term stability, we anticipate that the photosensitive effect can possibly be mitigated by adjusting the chemical composition of the chalcogenide glass, which is beyond the scope of this work.

Response:
We thank the reviewer for helping us explain why the experimentally achieved resolutions are worse than the theoretical values. Following the reviewer's suggestion, we added a short discussion in the section "Characterization of the micro-ring sensor" of the revised main text to explain the discrepancy between the experimental values and theoretical predictions: (a) The small discrepancy between the estimated value (35.5 μm) and the measured one (50.4 μm) is because f0 = 60 MHz applies for only small acceptance angles so that the average central frequency over the entire angular range is smaller. (b) This observation originates from the fact that the bandwidth reduces considerably for large acceptance angles so that the average effective bandwidth over the entire angular range is much smaller. Moreover, the fact that high-frequency components attenuate much more than low-frequency counterparts in agar that covers the carbon fiber might also contribute.

9)
Conventionally, when one wishes to quantify the resolution (or more precisely, the width of the PSF) one does that on the envelope of the reconstruction and not only on the positive values. The width of the main lobe just represents the central frequency, but it's the width of the envelope that is inversely proportional to the BW. Obviously, for narrowband sensors, where the PSF has a strong ringing effect with many cycles, the axial resolution is always determined by the envelope width. See for example, 10.1002/jbio.201800357. This is true also for any textbook on ultrasound. Accordingly, I suggest updating the number calculated for the axial resolution.